Bootstrap Methods for the Cost-Sensitive Evaluation of Classifiers
Authors
Abstract
Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult to tell whether these new cost-sensitive methods are better than existing methods that ignore costs, and it is also difficult to tell whether one cost-sensitive method is better than another. To rectify this problem, this paper presents two statistical methods for the cost-sensitive setting. The first constructs a confidence interval for the expected cost of a single classifier. The second constructs a confidence interval for the expected difference in costs of two classifiers. In both cases, the basic idea is to separate the problem of estimating the probabilities of each cell in the confusion matrix (which is independent of the cost matrix) from the problem of computing the expected cost. We show experimentally that these bootstrap tests work better than applying standard z tests based on the normal distribution.
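The separation described in the abstract can be illustrated with a minimal sketch. It is not the paper's exact procedure: it assumes a plain percentile bootstrap with no smoothing of the cell probabilities, a cost matrix indexed as cost[predicted, true], and integer class labels. The function names and parameters (n_boot, alpha, seed) are hypothetical.

```python
import numpy as np

def bootstrap_cost_ci(y_true, y_pred, cost, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the expected cost of one classifier.

    Assumption: cost[i, j] is the cost of predicting class i when the
    true class is j.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    cost = np.asarray(cost, dtype=float)
    k, n = cost.shape[0], len(y_true)

    # Step 1: estimate the cell probabilities of the confusion matrix.
    # This step does not involve the cost matrix.
    counts = np.zeros((k, k))
    np.add.at(counts, (y_pred, y_true), 1)
    p_hat = counts / n

    # Step 2: resample confusion matrices and price each one with the costs.
    resampled = rng.multinomial(n, p_hat.ravel(), size=n_boot) / n
    costs = resampled @ cost.ravel()
    lo, hi = np.quantile(costs, [alpha / 2, 1 - alpha / 2])
    return float((p_hat * cost).sum()), float(lo), float(hi)

def bootstrap_cost_difference_ci(y_true, pred_a, pred_b, cost,
                                 n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for cost(A) - cost(B) on the same test set."""
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true, dtype=int)
    pred_a = np.asarray(pred_a, dtype=int)
    pred_b = np.asarray(pred_b, dtype=int)
    cost = np.asarray(cost, dtype=float)
    k, n = cost.shape[0], len(y_true)

    # Joint cell probabilities over (prediction of A, prediction of B, truth).
    counts = np.zeros((k, k, k))
    np.add.at(counts, (pred_a, pred_b, y_true), 1)
    p_hat = counts / n

    # delta[a, b, t] = cost[a, t] - cost[b, t]
    delta = cost[:, None, :] - cost[None, :, :]

    resampled = rng.multinomial(n, p_hat.ravel(), size=n_boot) / n
    diffs = resampled @ delta.ravel()
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return float((p_hat * delta).sum()), float(lo), float(hi)

if __name__ == "__main__":
    # Toy data, made up for illustration only.
    truth = [0, 0, 0, 1, 1, 1, 1, 0]
    model_a = [0, 1, 0, 1, 1, 0, 1, 0]
    model_b = [0, 0, 0, 1, 0, 0, 1, 0]
    costs = [[0.0, 10.0], [1.0, 0.0]]  # a missed class 1 costs 10, a false alarm costs 1
    print(bootstrap_cost_ci(truth, model_a, costs))
    print(bootstrap_cost_difference_ci(truth, model_a, model_b, costs))
```

Under these assumptions, one would read an interval for the difference that excludes zero as evidence that the two classifiers differ in expected cost. Because the cell probabilities are estimated once, the same resampled matrices can be re-priced under a different cost matrix without re-estimating anything.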
Similar resources
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
Performance Evaluation of Machine Learning Classifiers in Sentiment Mining
In recent years, machine learning classifiers have been of great value in solving a variety of problems in text classification. Sentiment mining is a kind of text classification in which messages are classified according to sentiment orientation, such as positive or negative. This paper extends the idea of evaluating the performance of various classifiers to show their effectiveness in sent...
Bootstrap Methods for the Cost-Sensitive Evaluation of Classifiers
Many machine learning applications require classifiers that minimize an asymmetric cost function rather than the misclassification rate, and several recent papers have addressed this problem. However, these papers have either applied no statistical testing or have applied statistical methods that are not appropriate for the cost-sensitive setting. Without good statistical methods, it is difficult ...
A parametric model for predicting cut point of hydraulic classifiers
A new parametric model was developed for predicting cut point of hydraulic classifiers. The model directly uses operating parameters including pulp flowrate, feed particle size characteristics, pulp solids content, solid density and particles retention time in the classification chamber and also covers uncontrollable errors using calibration constants. The model applicability was first verified...
Bayesian Methods for the Evaluation of Classifiers
This paper presents a Bayesian approach to estimating the risk (or the expected loss) of classifiers, and discusses some experimental results and the issues that have to be considered when assessing the risk of classifiers. The development of the proposed methodology was motivated by the shortcomings observed in employing the bootstrap tests of Margineantu and Dietterich [10] especially when ap...